1. Data Exploration and Visualization

2. Linear Regression Model Development: Fit a linear regression model to predict Estimated Shares Outstanding.

# Dependent variable: "Estimated Shares Outstanding" is the dependent variable, i.e. the quantity the model predicts from the other factors.
# R-squared (0.854): The model explains 85.4% of the variability in the dependent variable.
# F-statistic (98.40): A large F-statistic indicates that the predictors are jointly statistically significant.
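For reference, the two summary statistics above can be written out as follows. This is a sketch using the standard definitions for ordinary least squares with an intercept; the symbols $n$ (number of observations) and $p$ (number of predictors) are notation introduced here, not values taken from the notebook.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
```

Because $F$ divides the explained share by $p$, adding many predictors can lower the F-statistic even when $R^2$ rises.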

P-Value Analysis and Histogram: Create a histogram of the p-values. Is there any skewness?

The histogram shows three main groups of p-values.

The first group is very close to zero. These low p-values indicate that some variables in the model are statistically significant predictors of the outcome.

The second group sits around the middle of the scale, near 0.4. These p-values are not low enough to be considered significant, so the corresponding variables may contribute little to the prediction.

The third group is close to 1. These high p-values give no evidence that the corresponding variables are related to the outcome.

Overall, the distribution is not uniform and appears somewhat skewed, with a concentration of p-values near zero.
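The grouping described above can be checked numerically by binning the p-values before plotting them. The following is a minimal sketch in plain Python; the function name `pvalue_histogram` and the p-values in `example_p` are hypothetical and only illustrate the idea (they are not the notebook's results). If no predictor mattered, p-values would be roughly uniform and the bins roughly equal, so a tall first bin points to real signal.

```python
def pvalue_histogram(p_values, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # A p-value of exactly 1.0 is placed in the last bin.
        bin_index = min(int(p * n_bins), n_bins - 1)
        counts[bin_index] += 1
    return counts


# Hypothetical p-values, chosen to mimic three clusters (near 0, near 0.4, near 1).
example_p = [0.001, 0.004, 0.02, 0.03, 0.38, 0.41, 0.44, 0.93, 0.97, 0.99]
counts = pvalue_histogram(example_p, n_bins=5)

# Print a simple text histogram, one row per bin.
width = 1.0 / len(counts)
for i, c in enumerate(counts):
    print(f"[{i * width:.1f}, {(i + 1) * width:.1f}) {'#' * c}")
```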

False Discovery Rate Control with BH Procedure: Given the p-values, use the Benjamini-Hochberg (BH) procedure to control the FDR at q = 0.1. How many "true" discoveries do you estimate?
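The BH step-up rule sorts the m p-values, finds the largest rank k with p_(k) <= q * k / m, and rejects every hypothesis ranked at or below k. Below is a minimal sketch in plain Python; the function name `benjamini_hochberg` and the p-values in `example_p` are hypothetical and are not the notebook's values. Since BH keeps the expected share of false discoveries at or below q, a rough estimate of the number of true discoveries is (1 - q) times the number of rejections.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return the sorted indices of hypotheses rejected by BH at level q."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= q * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank
    # Step-up rule: reject everything ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
q = 0.1
rejected = benjamini_hochberg(example_p, q=q)
n_discoveries = len(rejected)
# Rough estimate: about (1 - q) of the discoveries are expected to be true.
estimated_true = (1 - q) * n_discoveries
print(n_discoveries, estimated_true)
```

Note that the third and fourth p-values fail their own thresholds but are still rejected, because a larger p-value further down the list passes its threshold.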

6. Sensitivity Analysis of FDR Control: If you apply the BH procedure at different q values, how do the results change? What does this tell you about the robustness of your significant variables?
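One way to approach this question is to compute BH-adjusted p-values, i.e. the smallest q at which each hypothesis would be rejected, and then count discoveries over a grid of q values. The sketch below is plain Python; the names `bh_adjusted` and `discoveries_by_q`, the q grid, and the p-values are all hypothetical choices for illustration. Variables whose adjusted p-value is far below every q of interest are robust; those that enter only at larger q are sensitive to the chosen threshold.

```python
def bh_adjusted(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def discoveries_by_q(p_values, q_grid):
    """Map each q in q_grid to the number of BH discoveries at that level."""
    adjusted = bh_adjusted(p_values)
    return {q: sum(a <= q for a in adjusted) for q in q_grid}


# Hypothetical p-values and q grid for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
sensitivity = discoveries_by_q(example_p, [0.02, 0.05, 0.1, 0.15, 0.25])
for q_level, count in sensitivity.items():
    print(f"q = {q_level}: {count} discoveries")
```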

7 [a] Exploring Interaction Terms

Insights:

High R-squared Value: The model has a high R-squared value (0.944), meaning it explains 94.4% of the variance in the dependent variable. This suggests a strong in-sample fit but also raises concerns about potential overfitting due to the model's complexity.

Risk of Multicollinearity: The high condition number signals potential multicollinearity among the predictor variables, which can lead to unstable coefficient estimates.

Significant Variables: Some predictors are statistically significant (e.g., 'Capital Surplus', 'Intangible Assets'). These variables are the most likely to have a reliable association with the dependent variable.
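Interaction terms of the kind explored here are commonly built as pairwise products of predictors. The following is a minimal sketch in plain Python, not the notebook's actual construction: the helper `add_pairwise_interactions`, the `a:b` naming of product columns, the column 'Total Revenue', and all numeric values are assumptions made for illustration.

```python
from itertools import combinations


def add_pairwise_interactions(rows, columns):
    """Return copies of rows with an 'a:b' product column for each column pair."""
    augmented = []
    for row in rows:
        new_row = dict(row)
        for a, b in combinations(columns, 2):
            # The interaction term is the product of the two predictors.
            new_row[f"{a}:{b}"] = row[a] * row[b]
        augmented.append(new_row)
    return augmented


# Hypothetical data; only the first two column names appear in the analysis above.
rows = [
    {"Capital Surplus": 2, "Intangible Assets": 3, "Total Revenue": 5},
    {"Capital Surplus": 4, "Intangible Assets": 1, "Total Revenue": 7},
]
columns = ["Capital Surplus", "Intangible Assets", "Total Revenue"]
with_interactions = add_pairwise_interactions(rows, columns)
print(sorted(with_interactions[0]))
```

With k predictors this adds k * (k - 1) / 2 columns, which is why a model with interaction terms grows so quickly in complexity.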

7 [b] Briefly explain why interaction terms might be important in the context of predicting Estimated Shares Outstanding using fundamental financial metrics.

Evaluate the performance of the new model with interaction terms, and compare it with the performance of the original model without interaction terms using appropriate metrics.

"""
1. R-squared and Adjusted R-squared:
   Model with Interaction Terms (Model 2): R-squared of 0.944 and Adjusted R-squared of 0.926.
   Model without Interaction Terms (Model 1): R-squared of 0.854 and Adjusted R-squared of 0.846.
2. F-statistic:
   Model 2: F-statistic of 52.58.
   Model 1: F-statistic of 98.40.
3. Model Complexity:
   Model 2: Much more complex, with 315 predictors.
   Model 1: Simpler, with 73 predictors.
4. AIC:
   Model 2: AIC = 5.440e+04.
   Model 1: AIC = 5.516e+04.
"""

Are there any significant changes in the model's performance or in the coefficients of the predictors?

R-squared and Adjusted R-squared: Model 2 has a higher R-squared and Adjusted R-squared, so it explains more of the variability in the dependent variable and fits better overall, even after adjusting for the number of predictors.

F-statistic: Model 1 has the higher F-statistic, but the F-statistic must be interpreted in the context of the number of predictors. Model 2 has far more predictors, so its lower F-statistic does not by itself mean it is the worse model.

P-values of the Coefficients: Both models contain a mix of significant and non-significant coefficients. The significance of individual predictors should be assessed, especially in Model 2, where the interaction terms add complexity. Some coefficients may be significant in one model but not in the other, which points to the importance of interaction effects.

Complexity: Model 2 may capture more nuanced relationships through its interaction terms, but its complexity makes it harder to interpret and increases the risk of overfitting.

AIC and BIC: Model 2 has a lower AIC, suggesting it offers a better trade-off between goodness of fit and complexity. As with AIC, a lower BIC indicates a better model, but BIC penalizes the number of parameters more heavily. Model 1 has a lower BIC, suggesting that under this heavier complexity penalty the original model without interaction terms may be preferable.
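The reason AIC and BIC can disagree is visible in their standard definitions, sketched below. Here $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood; these symbols are notation introduced for this note.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

The per-parameter penalty is $2$ for AIC and $\ln n$ for BIC, so BIC is the stricter criterion whenever $n > e^2 \approx 7.4$. With several hundred extra parameters in Model 2, that gap can be enough to reverse the ranking.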

FDR Analysis with Interaction Terms: Create a histogram of the p-values for the new model including interaction terms. Discuss any noticeable differences from the histogram you created for the original model.

Compare these results with those obtained from the original model. Discuss the impact of including interaction terms on the number of discoveries and the control of the FDR.